Abstract

In this paper we describe a dynamic external memory data structure that supports range
reporting queries in three dimensions in $O(\log_B^2 N + \frac{k}{B})$ I/O operations, where $k$ is the number
of points in the answer and $B$ is the block size. This is the first dynamic data structure that
answers three-dimensional range reporting queries in $\log_B^{O(1)} N + O(\frac{k}{B})$ I/Os.

1 Introduction 

The orthogonal range reporting problem is to maintain a set of points $S$ in a data structure so
that for an arbitrary query rectangle $Q$ all points in $Q \cap S$ can be reported. This is a fundamental
problem with several important applications, such as geographic information systems, computer
graphics, and databases. In this paper we present a dynamic external-memory data structure that
supports three-dimensional range reporting queries in $O(\log_B^2 N + \frac{k}{B})$ I/O operations and updates
in $O(\log_2^3 N)$ I/O operations, where $k$ is the number of reported points and $N$ is the number of
points in the data structure.

In the external memory model the data is stored in disk blocks of size B, a block can be read into 
internal memory from disk (resp. written from internal memory into disk) with one I/O operation, 
and computation can only be performed on data stored in the internal memory. The space usage 
is measured in the number of blocks, and the time complexity is measured in the number of I/O 
operations. A more detailed description of the external memory model can be found in, e.g., [21].
Since we are interested in minimizing the number of I/O operations, an efficient data
structure should support queries in $\log_B^{O(1)} N + O(\frac{k}{B})$ I/O operations.
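
To give a feel for bounds of this form, the following Python sketch evaluates the expression $\log_B^2 N + k/B$ numerically. The function name and the sample parameters are illustrative assumptions, and the constants hidden by the O-notation are ignored.

```python
import math

def query_io_bound(n_points, block_size, k_reported):
    """Evaluate log_B(N)^2 + k/B, ignoring the constants hidden by O(.)."""
    log_b_n = math.log(n_points) / math.log(block_size)
    return log_b_n ** 2 + k_reported / block_size

# Hypothetical parameters: 2^30 points, blocks of 2^10 points, 2^20 reported points.
bound = query_io_bound(2 ** 30, 2 ** 10, 2 ** 20)
```

For these values the output term $k/B$ dominates the search term, which is the typical situation for large answers.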

In the RAM computation model, there are both static and dynamic data structures that
use $N\log^{O(1)} N$ space and support $d$-dimensional orthogonal queries in $O(\log_2 N + k)$ time; see,
e.g., [3] for a survey of previous results. In the external memory model, these results can be
matched only in two dimensions (dynamic data structure) and three dimensions (static data
structure). The dynamic data structure of Arge et al. [9] uses $O((N/B)\log_2 N/\log_2\log_B N)$ blocks of
space and supports two-dimensional range reporting queries and updates in $O(\log_B N + \frac{k}{B})$ and
$O(\log_B N(\log_2 N/\log_2\log_B N))$ I/O operations respectively. The static data structure of Vengroff
and Vitter [22, 21] supports three-dimensional range reporting queries in $O(\log_B N + \frac{k}{B})$ I/Os
and uses $O((N/B)\log_2^4 N)$ blocks of space. The space usage of a three-dimensional data
structure was improved by Afshani [1] and Afshani, Arge, and Larsen [2] to $O((N/B)\log_2^3 N)$ and
$O((N/B)(\log_2 N/\log_2\log_B N)^3)$ blocks respectively; see Table 1. The query cost can be improved
if all point coordinates are positive integers bounded by a parameter U [T5J [TBI CQ) an d the space 
Table 1: Upper bounds for orthogonal range reporting in RAM and external memory models in
two and three dimensions. Only dynamic results in the RAM model are listed. For comparison,
the space usage of data structures in the RAM model is specified in blocks of size $B$. We denote
by $\omega$ and $\varepsilon$ arbitrary constants such that $\varepsilon > 0$ and $\omega > 7/8$; our result is marked with an asterisk.



usage can be reduced for some special cases of orthogonal queries, such as dominance queries; we 
refer the reader to [1, 2] for a more detailed description of special cases and to [7] for an extensive
description of previous results. 

Using range trees with fan-out $\log^{\varepsilon} N$ [5], we can transform a two-dimensional data
structure into a data structure that supports $d$-dimensional orthogonal queries, so that the cost of
queries and updates increases by an $O(\log_2 N/\log_2\log_2 N)$ factor for each dimension and the space
usage increases by a factor $O(\log^{2+\varepsilon} N)$ for each dimension. The recent (static) dimension
reduction technique of [2] increases the cost of queries by an $O(\log_2 N/\log_2\log_B N)$ factor and the space
usage also by an $O(\log_2 N/\log_2\log_B N)$ factor. These techniques can be used to obtain
three-dimensional data structures that support queries with $O(\log_B N(\log_2 N/\log_2\log_2 N) + \frac{k}{B})$ and
$O(\log_B N(\log_2 N/\log_2\log_B N) + \frac{k}{B})$ I/Os respectively; see Table 1. However, these data
structures do not achieve an $O(\log_B^c N)$ query bound for any $B$ and a constant $c$. In the case when
B = fi((log 2 N)fW) for some function /(A) = 0(1), we need f2(/(iV) log B N) + 0(|) operations 
to answer queries using the combination of [9] and [5] or the result of [2]. We also do not know whether
there are efficient (static or dynamic) data structures for range reporting in $d \geq 4$ dimensions that
report all points with $\log_B^{O(1)} N + O(\frac{k}{B})$ operations.

In this paper we describe a data structure that uses $O(\frac{N}{B}\log_2^2 N\log_2^2 B)$ blocks of space,
supports updates in $O(\log_2^3 N)$ amortized I/Os, and answers three-dimensional orthogonal range
reporting queries in $O(\log_B^2 N + \frac{k}{B})$ I/Os. Thus our result "matches" the query complexity of the
dynamic RAM data structure of [13]. Moreover, the space usage of our data structure differs by
an $O(\log_2^2 B(\log_2\log_B N)^3/\log_2 N)$ factor from the best previously known external memory static
data structure [2]. Hence, when $B$ is not very large, i.e., when $\log_2 B = o(\sqrt{\log_2 N/(\log_2\log_B N)^3})$,
our dynamic data structure uses less space than the static data structure of [2].
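
The space comparison can be checked by dividing the two bounds; the short derivation below uses only the expressions quoted in this paragraph.

```latex
\[
\frac{\frac{N}{B}\,\log_2^2 N\,\log_2^2 B}
     {\frac{N}{B}\left(\frac{\log_2 N}{\log_2\log_B N}\right)^{3}}
\;=\;
\frac{\log_2^2 B\,(\log_2\log_B N)^{3}}{\log_2 N}
\]
```

The ratio tends to zero exactly when $\log_2^2 B = o(\log_2 N/(\log_2\log_B N)^3)$, which is the stated condition after taking square roots.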

In section 2 we describe the dynamic data structure that supports dominance queries in
$O(\frac{k}{B})$ I/Os when the set $S$ contains $O(B^{4/3})$ points. Our data structure maintains $O(\log_2 B)$
$t$-approximate boundaries of [22], which will be defined in section 2. We show that each $t$-approximate
boundary can be constructed with $O(B\log_2 B)$ I/O operations for $t > B$ and a small set $S$. The
cost of re-building the data structure is distributed among $O(B^{4/3})$ updates with the lazy updates
approach: the newly inserted and deleted points are stored in two buffers for each $t$-approximate
boundary, and each $t$-approximate boundary is re-built when one of its buffers contains a sufficient
number of points. We further improve the update time by showing how to store only two buffers
for all boundaries. The trick of storing inserted (deleted) points for different boundaries in the
same buffer may be of independent interest. Using standard techniques, more general orthogonal
range queries can be reduced to dominance queries as described in section 2.1.

In section 3 we describe the data structure that supports (2,1,2)-sided queries $Q = [a,b] \times
[c,+\infty) \times [d,e]$ on a set of points $S$ such that $p.z = O(B^f)$ for a small constant $f$ and for any
$p \in S$. Here and further we denote by $p.x$, $p.y$, and $p.z$ the x-, y-, and z-coordinates of a point $p$.
The main idea of section 3 is to store points in a data structure $T$ that is similar to the external
memory priority search tree, but contains three-dimensional points. The data structure for small
sets from section 2.1 is used to guide the search in each node of $T$. The data structure that
supports arbitrary (2,1,2)-sided queries is described in section 4. The data structure is based on a
range tree with fan-out $\Theta(B^f)$ for a small constant $f$ that is built on z-coordinates of points. The
main idea of section 4 is to store the data structure $F_v$ of section 3 in every node $v$ of the range
tree. The z-coordinate of each point $p$ in $F_v$ is replaced with an index bounded by $\Theta(B^f)$ that
indicates which child of the node $v$ contains $p$. We show how a general (2,1,2)-sided query can be
reduced to $O(\log_B N)$ queries to data structures $F_v$. Finally, we can obtain a data structure for
general three-dimensional queries from the data structure for (2,1,2)-sided queries using standard
techniques.

Thus our approach is based on a combination of some previously known techniques with some 
novel ideas. In particular we believe that the data structures described in sections 3 and 4 and the
general decomposition of the three-dimensional range reporting problem into subproblems are new. 

2 Dominance Reporting for Small Sets 

A point $q$ dominates a point $p$ if all coordinates of $q$ are greater than or equal to the respective
coordinates of $p$. The dominance reporting query is to report all points $p \in S$ that dominate a
query point $q$. A three-dimensional dominance reporting query is equivalent to reporting all points
in a product of three half-open intervals. In this section we describe a dynamic data structure that
contains $O(B^{4/3})$ elements and supports dominance reporting queries and updates. The main idea
of this data structure is that the $t$-approximate boundary [22] for a small set of elements can be
efficiently maintained under insertions and deletions.
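
As a point of reference for the definition above, a brute-force dominance reporting query can be sketched in a few lines of Python. This is only a baseline for checking answers, not the data structure of this section; the function names and the sample points are illustrative assumptions.

```python
def dominates(q, p):
    """Return True if q dominates p: every coordinate of q is >= that of p."""
    return all(qc >= pc for qc, pc in zip(q, p))

def dominance_report(points, q):
    """Brute-force baseline: report all points of S that dominate the query q."""
    return [p for p in points if dominates(p, q)]

S = [(1, 5, 2), (4, 4, 4), (3, 6, 1), (5, 7, 9)]
answer = dominance_report(S, (3, 4, 2))
```

The baseline scans all of $S$, so it uses $\Theta(|S|/B)$ I/Os regardless of the answer size, whereas the structure described below aims for $O(k/B)$.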

Overview. A three-dimensional $t$-approximate boundary was introduced by Vengroff and
Vitter [22]. A $t$-approximate boundary for a three-dimensional set $S$ is a surface $V$ that satisfies the
following properties: (1) $V$ divides the space, i.e., every point either dominates a point on $V$ or
is dominated by a point of $V$; (2) every point of $V$ is dominated by at least $t$ and at most $3t$
points of $S$. An example of a $t$-approximate boundary constructed with the algorithm of [22] is
shown in Fig. 1. There are $O(|S|)$ points on $V$ called inward corners, such that every point on $V$
dominates an inward corner and an inward corner does not dominate any point on $V$ (except
itself). There is a linear space data structure that finds an inward corner $c$ of $V$ that is dominated
by a query point $q$, if such an inward corner $c$ exists, and reports all points of $S$ that dominate $c$ in
$O(\log_B |S| + t/B)$ I/Os. We maintain $s = (\log_2 B)/6$ $t$-approximate boundaries $V_1, V_2, \ldots, V_s$, where
$V_i$ is a $B \cdot 2^{2i}$-approximate boundary. Given a query point $q$, we examine $V_1, V_2, \ldots, V_s$ and find the
minimal index $l$ such that $q$ dominates an inward corner $c_l$ of $V_l$ using the method described in [22].
We can test each $V_i$ in $O(\log_B B^{4/3}) = O(1)$ I/Os and find the index $l$ in $O(l)$ I/Os. If $q$ dominates
Figure 1: An example of a $t$-approximate boundary. The points of the set $S$ are not shown. Ridges
$R'_2$, $R'_3$, $R'_4$, and $R'_5$ are drawn with dotted lines. Ridges $R_1$, $R_2$, $R_3$, and $R_4$ are drawn with solid
lines. A, B, C, D, E are examples of inward corners. X, Y, Z, and W are examples of inner corners; X
belongs to ridge $R_1$, and Y, Z, and W belong to ridge $R_3$.

an inward corner $c_l$ of $V_l$ but does not dominate any point on $V_{l-1}$, then $q$ is dominated by $\Theta(2^{2l}B)$
points of $S$. Since $q$ dominates $c_l$, all points that dominate $q$ also dominate $c_l$. Hence, we can
examine the list of points that dominate $c_l$ and report all points that dominate $q$ in $O(2^{2l}) = O(\frac{k}{B})$
I/O operations. Thus the total query cost is $O(\frac{k}{B})$. See [22] for a more detailed description.

We can construct a $t$-approximate boundary $V_i$ with $O(B)$ I/O operations if $S$ contains $O(B^{4/3})$
points and $t > B$; the algorithm is described in section 5. In the next part of this section we show
how the data structure for a small set of points can be dynamized by distributing the construction
cost among $\Theta(B)$ update operations. This is achieved by storing buffers with newly inserted and
deleted points and periodically rebuilding the data structure. Then, we show that we can support
update operations in $O(1)$ I/Os on the data structure that consists of $O(\log_2 B)$ boundaries by
storing one buffer with recently inserted points and one buffer with recently deleted points for all
$t$-approximate boundaries.

Deletion-only Data Structure. A $t$-approximate boundary $V_i$ supports lazy deletions in $O(1)$
amortized I/O operations. When a point $p$ is deleted, we simply add it to a list $\mathcal{D}$ of deleted
elements that may contain up to $2^{2i-1}B$ points. Let $T$ be the list of points that dominate a query
point $q$; we can obtain $T$ in $O(\frac{|T|}{B})$ I/Os as described in the beginning of this section. We can sort $T$
in $O(\frac{|T|}{B}\log_B |T|) = O(2^{2i}\log_B(2^{2i}B)) = O(2^{2i})$ I/Os (we assume that each point in $S$ has a unique
integer identifier). We can also sort $\mathcal{D}$ in $O(2^{2i})$ I/Os. Then, we traverse $T$ and $\mathcal{D}$ and remove
from $T$ all points that occur in $\mathcal{D}$ in $O(\frac{|T| + |\mathcal{D}|}{B}) = O(2^{2i})$ I/Os. Since we use $V_i$ when $\frac{k}{B} = \Omega(2^{2i})$,
the query cost remains unchanged. When the number of deleted points in $\mathcal{D}$ equals $B \cdot 2^{2i}/2$,
we re-build the data structure for $V_i$ without deleted points in $O(B)$ I/Os and empty the list $\mathcal{D}$.
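
The sort-and-traverse step for lazy deletions can be sketched in Python as follows. This is an in-memory illustration under the assumption that each point carries a unique integer identifier; the function name and the `(point_id, point)` representation are hypothetical.

```python
def filter_deleted(candidates, deleted):
    """Remove from `candidates` every point whose id occurs in `deleted`.

    Both inputs are lists of (point_id, point) pairs. They are sorted by id
    and then scanned once in parallel, mirroring the sort-and-traverse step.
    """
    cand = sorted(candidates, key=lambda item: item[0])
    dele = sorted(deleted, key=lambda item: item[0])
    result, j = [], 0
    for pid, point in cand:
        # Advance in the deleted list until its id is at least pid.
        while j < len(dele) and dele[j][0] < pid:
            j += 1
        if j < len(dele) and dele[j][0] == pid:
            continue  # point was lazily deleted; skip it
        result.append((pid, point))
    return result

T = [(3, (1, 1, 1)), (1, (2, 2, 2)), (2, (0, 0, 0))]
D = [(2, (0, 0, 0)), (7, (9, 9, 9))]
alive = filter_deleted(T, D)
```

In external memory both lists would be sorted with a blocked sorting routine and merged block by block, which is where the stated I/O bounds come from.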

Supporting Insertions. Insertions can be supported with a similar technique. Inserted points are
stored in the list of new points $\mathcal{I}$ that may contain up to $2^{2i-1}B$ points. When a point $p$ is deleted,
we add it to a list $\mathcal{D}$ of deleted points as described above. If a point $p$ stored in $\mathcal{I}$ is deleted, we
simply remove $p$ from $\mathcal{I}$. When $\mathcal{I}$ contains $2^{2i-1}B$ points, we re-build the data structure for $V_i$. To
answer a query, we examine all points from $T$ that do not belong to $\mathcal{D}$ in $O(\frac{k}{B})$ I/Os as described in
the previous paragraph. Then, we traverse the list $\mathcal{I}$ and report all points that dominate the query
point in $O(2^{2i-1}) = O(\frac{k}{B})$ I/Os.

Updates with $O(1)$ Cost. Since our data structure consists of $O(\log_2 B)$ boundaries $V_i$, the total
cost of an update is $O(\log_2 B)$. We can reduce the amortized update cost to a constant by storing
newly inserted points for all boundaries in one list $\mathcal{I}$ and deleted points for all boundaries in one list
$\mathcal{D}$. An array $D$ stores pointers to elements of $\mathcal{D}$, such that all elements between $D[i]$ and the end
of $\mathcal{D}$ are deleted points that are not yet removed from the data structure for $V_i$. An array $I$ stores
pointers to elements of $\mathcal{I}$, such that all elements between $I[i]$ and the end of $\mathcal{I}$ are new elements
that are not yet inserted into the data structure for $V_i$. The pointer $end(\mathcal{D})$ ($end(\mathcal{I})$) points to the
last (in chronological order) deleted (inserted) element stored in $\mathcal{D}$ ($\mathcal{I}$). Both $\mathcal{D}$ and $\mathcal{I}$ also contain
one additional dummy element $l_d$ (resp., $l_i$) that follows $end(\mathcal{D})$ (resp., $end(\mathcal{I})$). When a new point
$p$ is inserted, we store $p$ in $l_i$, set the pointer $end(\mathcal{I})$ so that it points to $l_i$, and append a new
dummy element after $end(\mathcal{I})$. A deleted element is appended at the end of $\mathcal{D}$ with the same procedure.
After $2^{2i-1}B$ deletions we rebuild the data structure for $V_i$ without deleted elements and change $D[i]$
so that it points to $l_d$. After $2^{2i-1}B$ insertions we rebuild the data structure for $V_i$ with new elements
and change $I[i]$ so that it points to $l_i$. After $O(\log_2 B \cdot B)$ updates, we re-build the data structures
for all $V_i$ as well as the lists $\mathcal{I}$ and $\mathcal{D}$. This incurs an amortized cost $O(1)$. The total cost of
re-building data structures and (pointers to) lists $\mathcal{D}$ and $\mathcal{I}$ in a sequence of $B^{4/3}$ update operations
is $O(\sum_{i=0}^{r} 2^{r-i}B) = O(B^{4/3})$, where $r = \log_2 B/3 + \mathrm{const}$ is the index of the last $t$-approximate
boundary $V_r$. We can report all points that dominate an inward corner of $V_i$ in $O(2^{2i})$ I/Os as
described above. Hence, dominance queries can be supported in $O(\frac{k}{B})$ I/Os. This result can be
summarized in the following Lemma.

Lemma 1 Elements of a set $S$ such that $|S| = O(B^{4/3})$ can be stored in a data structure that uses
$O(\frac{|S|}{B}\log_2 |S|)$ blocks of space and supports dominance queries in $O(\frac{k}{B})$ I/O operations and updates
in $O(1)$ I/O operations amortized.
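
The idea of sharing one buffer among all boundaries, used to obtain the constant update cost in Lemma 1, can be illustrated with the following Python sketch. It is a simplified in-memory model under stated assumptions: the class and method names are hypothetical, rebuilding $V_i$ is replaced by a counter, and the periodic global rebuild that truncates the list is omitted.

```python
class SharedBuffer:
    """One chronological buffer shared by boundaries V_1..V_s (a sketch).

    pointer[i] is the position in `entries` from which updates are still
    pending for boundary V_i; everything before it is already reflected
    in the structure for V_i.
    """

    def __init__(self, block_size, num_boundaries):
        self.B = block_size
        self.entries = []
        self.pointer = {i: 0 for i in range(1, num_boundaries + 1)}
        self.rebuilds = {i: 0 for i in range(1, num_boundaries + 1)}

    def threshold(self, i):
        # V_i is rebuilt after 2^(2i-1) * B buffered updates.
        return 2 ** (2 * i - 1) * self.B

    def pending(self, i):
        """Updates that V_i has not yet absorbed."""
        return self.entries[self.pointer[i]:]

    def append(self, point):
        self.entries.append(point)
        for i in self.pointer:
            if len(self.entries) - self.pointer[i] >= self.threshold(i):
                self.rebuilds[i] += 1                 # stand-in for rebuilding V_i
                self.pointer[i] = len(self.entries)   # V_i is now up to date

buf = SharedBuffer(block_size=2, num_boundaries=2)
for n in range(18):
    buf.append((n, n, n))
```

Each update is written once, and a boundary with a larger threshold simply keeps an older pointer into the same list, so no point is copied into several buffers.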

2.1 (1,1,2)- and (2,1,2)-Sided Queries for Small Sets

Suppose that $b_x$, $b_y$, and $b_z$ are natural constants such that $1 \le b_x, b_y, b_z \le 2$. We say that a
query $Q$ is a $(b_x, b_y, b_z)$-sided query if the projection of $Q$ on the x-axis is bounded on $b_x$ sides, the
projection of $Q$ on the y-axis is bounded on $b_y$ sides, and the projection of $Q$ on the z-axis is bounded
on $b_z$ sides. Thus the projection of $Q$ on the x-axis (resp., y- or z-axis) is an infinite half-open
interval if $b_x$ (resp., $b_y$ or $b_z$) equals 1, and the projection of $Q$ on the x-axis (resp., y- or z-axis)
is a finite closed interval if $b_x$ (resp., $b_y$ or $b_z$) equals 2. Dominance queries considered in section 2
are equivalent to (1,1,1)-sided queries. Using a standard reduction [11, 20], we can transform
an $O(s(N))$ space data structure that supports (1,1,1)-sided queries in $O(t(N) + \frac{k}{B})$ time and
updates in $O(u(N))$ time into an $O(s(N)\log^m N)$ space data structure that supports $(b_x, b_y, b_z)$-sided
queries in $O(t(N) + \frac{k}{B})$ time and updates in $O(u(N)\log^m N)$ time; here $m = b_x + b_y + b_z - 3$.
Applying this transformation to Lemma 1, we obtain the following result.

Lemma 2 Let $1 \le b_x, b_y, b_z \le 2$ and $m = b_x + b_y + b_z - 3$. Elements of a set $S$ such that
$|S| = O(B^{4/3})$ can be stored in a data structure that uses $O(\frac{|S|}{B}\log_2^{m+1} |S|)$ blocks of space and
supports $(b_x, b_y, b_z)$-sided queries in $O(\frac{k}{B})$ I/O operations and updates in $O(\log_2^m |S|)$ I/O operations
amortized.

In particular, we can support (2,1,2)-sided queries in $O(\frac{k}{B})$ I/Os and updates in $O(\log_2^2 B)$ I/Os
on a set $S$ that contains $O(B^{4/3})$ points using a data structure that needs $O(B^{1/3}\log_2^3 B)$ blocks of
space.
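
The classification of queries by the number of bounded sides, and the exponent $m$ it induces, can be made concrete with a short Python sketch. The representation of a query as three `(low, high)` pairs with `math.inf` for an unbounded side is an assumption made for illustration.

```python
import math

def sidedness(query):
    """Return (b_x, b_y, b_z) for a box given as three (low, high) intervals.

    An interval bounded on both ends counts as 2 sides; an interval with one
    infinite endpoint counts as 1 side.
    """
    sides = []
    for low, high in query:
        bounded = (not math.isinf(low)) + (not math.isinf(high))
        if bounded == 0:
            raise ValueError("each projection must be bounded on at least one side")
        sides.append(bounded)
    return tuple(sides)

def extra_log_factors(query):
    """m = b_x + b_y + b_z - 3, the exponent of the logarithmic overhead."""
    return sum(sidedness(query)) - 3

q = ((1, 5), (2, math.inf), (0, 7))          # a (2, 1, 2)-sided query
dominance = ((3, math.inf), (4, math.inf), (2, math.inf))
```

For the (2,1,2)-sided query the sketch gives $m = 2$, matching the $\log_2^2 B$ update bound and the $\log_2^3 B$ space factor quoted above; for a dominance query it gives $m = 0$.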

3 Extended Three-Sided Queries 

In this section we describe a data structure that supports (2,1,2)-sided reporting queries when
z-coordinates of all points are positive integers bounded by $\Theta(B^f)$, i.e., $p.z = O(B^f)$ for all points
$p \in S$. Here $f$ is a constant such that $f < 1/6$.

Data Structure. Our data structure is a modification of the external memory priority search
tree [9]. The (external) priority search tree is a tree built on x-coordinates of two-dimensional
points. A point stored in a leaf $l$ is associated with an ancestor of $l$ or with $l$ itself, so that the
following property is guaranteed: points associated with a node $v$ have larger y-coordinates than
points associated with descendants of $v$. The main idea of our modification is to maintain this
property for every possible value of the z-coordinate. Then, we maintain the data structure of
section 2.1 in each tree node and use it to guide the search, i.e., to decide which descendants of a
node must be visited.

We construct a tree $T$ with fan-out $\Theta(B^f)$ on the set of x-coordinates of all points. We store
$\Theta(B^{1+f})$ values, i.e., x-coordinates of $\Theta(B^{1+f})$ consecutive points of $S$, in each leaf node. The
range of an internal node $v$ is an interval $rng(v) = [a_v, b_v]$, where $a_v$ and $b_v$ are the smallest and
the largest values stored in the leaf descendants of $v$.

We associate a set of points $S_v$ with each node $v$ of $T$. Sets $S_v$ can be constructed by visiting
nodes of $T$ in pre-order. For the root $r$ of $T$, let $L_r$ be the set of all points in $S$ sorted in increasing
order by their y-coordinates, and let $L_r[j]$ be the set of all points $p \in S$, $p.z = j$, sorted in increasing
order by their y-coordinates. The set $S_r[j]$ contains the last $B$ points of $L_r[j]$, i.e., $B$ points with
largest y-coordinates. For each non-root node $v$ of $T$, the list $L_v$ contains all points $p$ such that $p.x$
belongs to the range of $v$ and $p$ does not belong to any $S_w$, where $w$ is an ancestor of $v$; points in
$L_v$ are sorted in increasing order by their y-coordinates. The list $L_v[j]$ contains all points $p \in L_v$
such that $p.z = j$. If $v$ is an internal node, the set $S_v[j]$ contains the last $B$ points of $L_v[j]$. If $v$ is
a leaf, then $S_v[j]$ contains all points from $L_v[j]$. Note that $L_v[j]$ and $S_v[j]$ may contain fewer than $B$
points or even be empty for some $j$. The set $S_v$ is the union of all sets $S_v[j]$, $S_v = \cup_j S_v[j]$. For any
node $v$, $|S_v| = O(B^{1+f})$. The set $S'_v$ contains at most one point from each set $S_v[j]$. If $|S_v[j]| = B$,
then $S'_v$ contains the point $p \in S_v[j]$ with minimal y-coordinate; otherwise $S'_v$ contains no points
from $S_v[j]$.
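
The pre-order construction of the sets $S_v[j]$ can be sketched in Python on a toy tree. This is an illustration under simplifying assumptions, not the external-memory layout: the function name, the dictionary representation of nodes, and the rule that children split the remaining points evenly by x (instead of using fixed x-ranges) are all hypothetical choices.

```python
from collections import defaultdict

def build(points, fanout, leaf_size, B):
    """Build a toy version of the tree on points (x, y, z) sorted by x.

    Each node keeps, for every z-value j, up to B points with the largest
    y-coordinates among the points in its x-range that no ancestor has taken.
    A leaf keeps all of its remaining points.
    """
    is_leaf = len(points) <= leaf_size
    by_z = defaultdict(list)
    for p in points:
        by_z[p[2]].append(p)
    stored, taken = {}, set()
    for j, group in by_z.items():
        group.sort(key=lambda p: p[1])             # increasing y, like L_v[j]
        stored[j] = group if is_leaf else group[-B:]
        taken.update(stored[j])
    children = []
    if not is_leaf:
        # Assumption of this sketch: children split the node's points evenly
        # and only receive the points that were not stored at this node.
        step = -(-len(points) // fanout)           # ceiling division
        for start in range(0, len(points), step):
            chunk = [p for p in points[start:start + step] if p not in taken]
            children.append(build(chunk, fanout, leaf_size, B))
    return {"S": stored, "children": children}

pts = sorted((x, (7 * x) % 11, x % 2) for x in range(20))
tree = build(pts, fanout=2, leaf_size=3, B=2)
```

Every point ends up in exactly one node, and for each fixed $j$ the points kept at a node have y-coordinates at least as large as those kept at its descendants, which is the property the search procedure relies on.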

We store data structures $D_v$ and $D'_v$ in each internal node $v$ of $T$. The data structure $D_v$
contains all points of $S_{v_i}$ for every child $v_i$ of $v$, and the data structure $D'_v$ contains all points of
$S'_{v_i}$ for every child $v_i$ of $v$. Thus $D_v$ contains $O(B^{1+2f})$ points, and $D'_v$ contains $O(B^{2f})$ points. By
Lemma 2, $D_v$ and $D'_v$ can be stored in $O(B^{2f}\log_2^3 B)$ and $O(\log_2^3 B)$ blocks respectively and support
(2,1,2)-sided queries in $O(1)$ I/O operations. In every node $v$ of $T$, we also store a data structure
$E_v$ that contains all points of $S_v$ and supports (2,1,2)-sided queries. Note that lists $L_v$ and $L_v[j]$
and sets $S_v[j]$ are not stored in the data structure; we only use them to simplify the description.

Search Procedure. Given a query $Q = [a,b] \times [c,+\infty) \times [d,e]$, we identify leaves $l_a$ and $l_b$: $l_a$
contains the smallest value that is greater than $a$ and $l_b$ contains the largest value that is smaller
than $b$. Let $\pi_a$ and $\pi_b$ denote the paths from the root of $T$ to $l_a$ and $l_b$ respectively. Let $\pi = \pi_a \cup \pi_b$
denote the set of all nodes of $T$ that belong to $\pi_a$ or $\pi_b$. Every point $p \in S$ such that $p.x \in [a,b]$ is
stored in some set $S_v$ such that either $v$ belongs to $\pi$ or $v$ is a descendant of a node that belongs
to $\pi$.

We can visit all nodes $v \in \pi$ and report all points in $S_v \cap Q$ in $O(\log_B N)$ I/Os using data
structures $E_v$ (we ignore the time needed to output points). Points in descendants of $v \in \pi$ can be
found using the following property.

Fact 1 Let $v'$, $v' \notin \pi$, be a child of a node $v \in \pi$, and let $w$ be a descendant of $v'$. If $S_w[j] \cap Q \neq \emptyset$,
then $|S_{par(w)}[j] \cap Q| = B$, where $par(w)$ denotes the parent of a node $w$.

Proof: Recall that $Q = [a,b] \times [c,+\infty) \times [d,e]$. For a child $v'$ of $v$, such that $v' \notin \pi$, either
$rng(v') \cap [a,b] = \emptyset$ or $rng(v') \subseteq [a,b]$. Hence, Fact 1 is non-trivial only in the case when $j \in [d,e]$
and $rng(v') \subseteq [a,b]$. In this case a point $p \in S_w[j]$ (resp., $p \in S_{par(w)}[j]$) belongs to $Q$ if and only
if $p.y \geq c$. Suppose that some $p \in S_w[j]$ belongs to $Q$. Since $p.y \geq c$ and $p'.y > p.y$ for any point
$p' \in S_{par(w)}[j]$, all points $p' \in S_{par(w)}[j]$ belong to $Q$. The set $S_{par(w)}[j]$ contains $B$ points because
$S_w[j]$ is not empty. □

Consider a node $v$, such that $v \in \pi_a$ and $v \notin \pi_b$. Suppose that the $i$-th child $v_i$ of $v$ belongs
to $\pi_a$ and $rng(v_{i+1}) = [a', b']$. We define the query $Q_v = [a', b] \times [c,+\infty) \times [d,e]$. For any point $p$
stored in a descendant $w$ of $v$, such that $w \notin \pi_a$, queries $Q_v$ and $Q$ are equivalent: $p$ belongs to
$Q$ if and only if $p$ belongs to $Q_v$. Points in $S_w \cap Q = S_w \cap Q_v$ for all descendants $w$ of $v$, $w \notin \pi_a$,
can be reported with the following recursive procedure. We report all points in $Q_v \cap S_{v_i}$ for all
children $v_i$ of $v$ using the data structure $D_v$. All children $v_i$ of $v$, such that $Q_v \cap S_{v_i}[j]$ contains at
least $B$ points for at least one $j$, can be identified using $D'_v$. We visit all such non-leaf nodes $v_i$ and
recursively call the same procedure.

Our procedure reports all points in $S_w \cap Q_v$: Suppose that $S_w[j] \cap Q_v \neq \emptyset$ for some $w$ and $j$.
Then $S_{par(w)}[j] \cap Q_v$ contains $B$ points by Fact 1. Hence, the parent of $w$ will be visited and all
points in $S_w \cap Q_v$ will be reported by querying the data structure $D_{par(w)}$. If $k_v$ is the total number
of points in $S_w \cap Q_v$ for all $w$, then the search procedure takes $O(\frac{k_v}{B})$ I/O operations: queries
answered by $D_w$ and $D'_w$ in every visited node $w$ take $O(1)$ I/O operations and a node $w$ is visited
only if $|S_w[j] \cap Q_v| = B$ for at least one value of $j$.

All points in $S_w \cap Q$ for all descendants $w$ of a node $v$, such that $v \in \pi_b$ but $v \notin \pi_a$ or $v$ is the
lowest common ancestor of $l_a$ and $l_b$, can be found with the same procedure. The only difference is
that the query $Q_v$ is defined differently: if $v \in \pi_b$, $v \notin \pi_a$, and the $i$-th child $v_i$ of $v$ belongs to $\pi_b$,
then $Q_v = [a, b'] \times [c,+\infty) \times [d,e]$ where $rng(v_{i-1}) = [a', b']$. If $v$ is the lowest common ancestor of
$l_a$ and $l_b$, then $v \in \pi_a$ and $v \in \pi_b$. Suppose that $v_i \in \pi_a$ and $v_j \in \pi_b$ where $v_i$ and $v_j$ are the children
of $v$. Then $Q_v = [a', b''] \times [c,+\infty) \times [d,e]$ where $rng(v_{i+1}) = [a', b']$ and $rng(v_{j-1}) = [a'', b'']$. Hence,
a query $Q$ can be answered with $O(\log_B N + \frac{k}{B})$ I/O operations.

Space Usage and Updates. Every data structure $D_v$ contains $O(B^{1+2f})$ points and can be
stored in $O(B^{2f}\log_2^3 B)$ blocks of space. Every $D'_v$ contains $O(B^{2f})$ points and can be stored in
$O(\log_2^3 B)$ blocks. There are $O(\frac{N}{B^{1+2f}})$ internal nodes in $T$; hence, all $D_v$ and $D'_v$ use $O(\frac{N}{B}\log_2^3 B)$
blocks. Every data structure $E_v$ contains $O(B^{1+f})$ points. Since the total number of nodes is
$O(\frac{N}{B^{1+f}})$, all $E_v$ can be stored in $O(\frac{N}{B}\log_2^3 B)$ blocks.

When a point $p$ is inserted into $S$, we identify the leaf $l_p$ in which $p.x$ must be stored and traverse
the path $\pi_p$ from $l_p$ to the root until we find a node $v$ such that $p.y < m_v.y$, where $m_v$ is the point
with maximal y-coordinate in $S_v[p.z]$. Then, we insert $p$ into $S_v[p.z]$. Now $S_v[p.z]$ may contain
$B+1$ points; if $|S_v[p.z]| = B+1$, the point $s_v$ with the smallest y-coordinate must be removed from
$S_v[p.z]$. We insert the point $s_v$ into $S_{v_i}[p.z]$ where $v_i$ is the child of $v$ such that $v_i$ belongs to $\pi_p$. If
$S_{v_i}[p.z]$ contains $B+1$ points, we move the point with the smallest y-coordinate from $S_{v_i}[p.z]$ to
$S_u[p.z]$ where $u$ is the child of $v_i$, $u \in \pi_p$. The procedure continues until $S_u[p.z]$ contains at most
$B$ points or the leaf $l_p$ is reached. In every node $u$ visited by the insertion procedure, one point is
inserted into $S_u$ and at most one point is deleted from $S_u$. Hence data structures $E_u$, $D_w$, and $D'_w$,
where $w$ denotes the parent of $u$, can be updated in $O(\log_2^2 B)$ I/Os. Since $O(\log_B N)$ nodes are
visited, insertion takes $O(\log_2 N\log_2 B)$ I/O operations. Deletions can be supported with a similar
procedure.
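
The push-down step of the insertion can be sketched in Python for a single root-to-leaf path and a single fixed z-value. This is a simplified top-down variant written for illustration, not a transcription of the procedure above: the function name is hypothetical, y-values are assumed distinct, and a non-full node is assumed to have empty lists below it.

```python
import bisect

def insert_on_path(path, y, B):
    """Insert y-value `y` into a root-to-leaf path of sorted lists.

    path[0] is the topmost node; every non-leaf list holds at most B values
    and the last list (the leaf) is unbounded. All values belong to one
    fixed z-coordinate.
    """
    carry = y
    for depth, node in enumerate(path):
        is_leaf = depth == len(path) - 1
        if is_leaf or len(node) < B or carry > node[0]:
            bisect.insort(node, carry)
            if is_leaf or len(node) <= B:
                return
            carry = node.pop(0)      # evict the smallest y and push it down

path = [[], [], []]
for value in (5, 3, 8, 1, 9, 4):
    insert_on_path(path, value, B=2)
```

Each visited node receives one value and gives up at most one, which is the reason every node on the path needs only a constant number of updates to its secondary structures.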

It remains to show how the tree $T$ can be re-balanced after update operations, so that the
height of $T$ is $O(\log_B N)$. We implement the base tree $T$ as a WBB-tree [10] with leaf parameter
$n_l = B^{1+f}$ and branching parameter $n_b = B^{f}$. In a WBB tree with this choice of parameters the
following invariants are maintained: each leaf contains between $B^{1+f}$ and $2B^{1+f} - 1$ values and
for each internal node $v$ on level $h$ (counting from the lowest level) there are between $B^{1+(h+1)f}/2$
and $2B^{1+(h+1)f} - 1$ values stored in leaf descendants of $v$. It is also shown in [10] that each internal
node has between $B^{f}/4$ and $4B^{f}$ children. Hence, the height of $T$ is $O(\log_B N)$.

If the invariants of a WBB tree are violated after an insertion, i.e., if a node $v$ on level $h$ contains
$2B^{1+(h+1)f}$ values (resp., $v$ contains $2B^{1+f}$ values if $v$ is a leaf), then we split the node $v$ into $v'$
and $v''$ that contain $\Theta(B^{1+(h+1)f})$ values each (resp., $\Theta(B^{1+f})$ values each). Splitting a node does
not affect the children of $v$, i.e., every child of $v$ becomes the child of $v'$ or $v''$ after splitting. It can be
shown [10] that a node $v$ on level $h$ is split at most once when a sequence of $B^{1+(h+1)f}/2$ values is
inserted into leaf descendants of $v$. See [10] for a complete description of the splitting procedure.

When a node $v$ is split into $v'$ and $v''$, data structures in nodes $v'$, $v''$, and in their descendants
may change. Since $S_v[j] = S_{v'}[j] \cup S_{v''}[j]$ for each $j$ after the split operation, at least one of $S_{v'}[j]$ and
$S_{v''}[j]$ contains fewer than $B$ points. Suppose that for some $j$, the set $S_{v'}[j]$ contains fewer than $B$
elements. If $S_{v_i}[j] \neq \emptyset$ for at least one child $v_i$ of $v'$, then some points must be moved from sets
$S_{v_i}[j]$ into $S_{v'}[j]$, where $v_i$ is a child of $v'$. Let $d_{v'} = \min(|\cup_i S_{v_i}[j]|, B - |S_{v'}[j]|)$. We can identify
$d_{v'}$ points with largest y-coordinates in $\cup_i S_{v_i}[j]$ using $D_{v'}$ and insert those points into $S_{v'}[j]$. Data
structures $E_{v'}$, $D_w$, and $D'_w$, where $w$ is the parent of $v'$, are updated accordingly. If $d_{v'} > 0$, we
recursively check sets $S_{v_i}$ for all children $v_i$ of $v'$. Data structures stored in the node $v''$ and the
descendants of $v''$ are processed in the same way. Each point is moved only once and the total
number of moved points does not exceed the total number of values stored in leaf descendants of
$v'$ and $v''$. When a point is moved, all affected data structures can be updated in $O(\log_2^2 B)$ I/Os.
The number of values stored in a node $v$ on level $h$ and all its descendants is $O(B^{1+(h+1)f})$. Since
$v$ is split at most once after $B^{1+(h+1)f}/2$ operations, the amortized cost for splitting a node is
$O(\log_2^2 B)$. Every leaf has $O(\log_B N)$ ancestors; hence, the total amortized cost of splits incurred
by an inserted point is $O(\log_2 B\log_2 N)$. Thus the total cost of an insertion is $O(\log_2 N\log_2 B)$.

We implement deletions with the lazy deletions approach. Suppose that a point $p$ such that
$p.x$ is stored in a leaf $l_p$ is deleted from $S$. Then we mark the value $p.x$ as deleted in $l_p$. When
$N/2$ values stored in leaves of $T$ are marked as deleted, we rebuild the tree $T$ and all secondary
data structures. This can be done in $O(N\log_2 B)$ I/O operations. Hence, rebuilding after deletions
incurs an amortized cost of $O(\log_2 B)$.

The result of this section is summed up in the following Lemma. 

Lemma 3 There exists an $O(\frac{N}{B}\log_2^3 B)$ space data structure that supports extended three-sided
queries in $O(\log_B N + \frac{k}{B})$ I/O operations and updates in $O(\log_2 N\log_2 B)$ I/O operations.

4 Range Reporting in Three Dimensions 

Using range trees with fan-out $\Theta(B^f)$, we can transform the result of section 3 into a data structure
for (2,1,2)-sided queries. For completeness, we sketch the data structure below.

We construct an external memory range tree on z-coordinates of the points in a set $S$:
z-coordinates of all points are stored in leaves of the tree; each leaf contains $\Theta(B)$ values and each
internal node has $\Theta(B^f)$ children. We denote by $R_v$ the set of points whose z-coordinates are stored
in leaf descendants of the node $v$. The data structure $F_v$ contains one point for each point $p \in R_v$.
If $p = (p.x, p.y, p.z)$, $p \in R_v$, is also stored in the $i$-th child $v_i$ of $v$, then $F_v$ contains the point
$p' = (p.x, p.y, i)$. In other words, we replace the z-coordinate of each point $p \in R_v$ with an index
$i \in [1, \Theta(B^f)]$, such that $p \in R_{v_i}$. $F_v$ supports (2,1,2)-sided queries as described in Lemma 3.

For each internal node $v$, let $int(v,i,j)$ denote the interval $[min_i, max_j]$ where $min_i$ denotes the
minimal value stored in a leaf descendant of the $i$-th child of $v$, and $max_j$ denotes the maximal
value stored in a leaf descendant of the $j$-th child of $v$. For a query $Q = [a,b] \times [c,+\infty) \times [d,e]$,
we can represent the interval $[d,e]$ as a union of $O(\log_B N)$ intervals $int(v, g_i, g_j)$. Hence, $Q$ can be
answered by answering $O(\log_B N)$ queries of the form $[a,b] \times [c,+\infty) \times int(v, g_i, g_j)$. Every such
query can be answered by the data structure $F_v$. Hence, a (2,1,2)-sided query can be answered
with $O(\log_B^2 N + \frac{k}{B})$ I/O operations. Since each point is stored in $O(\log_B N)$ data structures $F_v$,
the space usage and update cost increase by a factor $O(\log_B N)$ compared with the data structure
of Lemma 3.
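
The decomposition of $[d,e]$ into runs of consecutive children can be sketched in Python on an implicit complete tree. This is an illustration under simplifying assumptions: every leaf holds a single z-rank instead of $\Theta(B)$ values, the number of ranks is a power of the fan-out $g$, and the function names are hypothetical.

```python
def decompose(lo, hi, d, e, g):
    """Cover [d, e] by runs of consecutive children in a g-ary tree.

    The node covers ranks lo..hi, where hi - lo + 1 is a power of g greater
    than 1. Each returned tuple (lo, hi, i, j) says that children i..j of
    that node lie entirely inside [d, e].
    """
    width = (hi - lo + 1) // g                 # ranks covered by one child
    full, partial = [], []
    for c in range(g):
        c_lo = lo + c * width
        c_hi = c_lo + width - 1
        if d <= c_lo and c_hi <= e:
            full.append(c)                     # child lies inside [d, e]
        elif c_lo <= e and d <= c_hi:
            partial.append((c_lo, c_hi))       # child straddles an endpoint
    pieces = [(lo, hi, full[0], full[-1])] if full else []
    for c_lo, c_hi in partial:
        pieces += decompose(c_lo, c_hi, d, e, g)
    return pieces

def covered(pieces, g):
    """Expand the pieces back into the set of ranks they cover."""
    ranks = set()
    for lo, hi, i, j in pieces:
        width = (hi - lo + 1) // g
        ranks.update(range(lo + i * width, lo + (j + 1) * width))
    return ranks

pieces = decompose(0, 63, 5, 45, 4)   # fan-out 4, height 3, query ranks 5..45
```

Each piece plays the role of one interval $int(v,i,j)$, and at most two nodes per level contribute a piece, so the number of pieces is proportional to the height of the tree.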

Lemma 4 There exists an $O(\frac{N}{B}\log_2 N\log_2^2 B)$ space data structure that supports (2,1,2)-sided
queries in $O(\log_B^2 N + \frac{k}{B})$ I/Os and updates in $O(\log_2^2 N)$ I/Os amortized.

Finally, we apply the reduction described in section 2.1 and obtain the main result of this paper.
The space usage and update cost increase by a factor $O(\log_2 N)$ in comparison with Lemma 4.

Theorem 1 There exists an $O(\frac{N}{B}\log_2^2 N\log_2^2 B)$ space data structure that supports three-dimensional
orthogonal range reporting queries in $O(\log_B^2 N + \frac{k}{B})$ I/O operations and updates in $O(\log_2^3 N)$
amortized I/O operations.
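
The way the update and space bounds compose across Lemma 3, Lemma 4, and Theorem 1 can be written out explicitly. The derivation below only uses the stated overhead factors and the identity $\log_B N = \log_2 N/\log_2 B$.

```latex
\begin{align*}
\text{updates:}\quad
  & O(\log_2 N \log_2 B)
    \;\xrightarrow{\;\times \log_B N\;}\; O(\log_2^2 N)
    \;\xrightarrow{\;\times \log_2 N\;}\; O(\log_2^3 N), \\
\text{space:}\quad
  & O\!\left(\tfrac{N}{B}\log_2^3 B\right)
    \;\xrightarrow{\;\times \log_B N\;}\; O\!\left(\tfrac{N}{B}\log_2 N \log_2^2 B\right)
    \;\xrightarrow{\;\times \log_2 N\;}\; O\!\left(\tfrac{N}{B}\log_2^2 N \log_2^2 B\right).
\end{align*}
```

The first arrow corresponds to the range tree on z-coordinates and the second to the reduction of section 2.1.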

5 Construction of a t- Approximate Boundary. 

We describe below a (slightly simplified) variant of the construction algorithm from [22] for the
case when all points have different x-, y-, and z-coordinates. The algorithm constructs a series
of ridges in order of descending z-coordinates. The ridge $R_0$ consists of a single point $(0, 0, z_{max})$,
where $z_{max}$ is the maximum z-coordinate of a point in $S$. During the $i$-th iteration, $i = 1, 2, \ldots$, the
ridge $R_i$ is constructed as follows. We move down $R_{i-1}$ until some point on $R_{i-1}$ is dominated by
$3t$ points or $R_{i-1}$ hits the (x,y)-plane; the new position of $R_{i-1}$ is denoted by $R'_i$. Let p be the
point of $R'_i$ that lies on the (x,z)-plane. We move p in the +x direction until p is dominated by $2t$
points of $S$. Then, the following loop is repeated until p hits the (y,z)-plane: (1) the y-coordinate
of p is increased until p is dominated by $t$ points; (2) the x-coordinate of p is decreased until p is
dominated by $2t$ points or p hits $R'_i$; (3) if p hits the ridge $R'_i$, p follows $R'_i$ until it hits the (y,z)-plane
or p is dominated by $2t$ points. The ridge $R_i$ is constructed when p hits the (y,z)-plane.
Positions of p before the loop begins and at the end of step (2) are called inner corners of V. Points
on $R'_i$ with the same (x,y)-coordinates as some inner corner on $R_{i-1}$ are also called inner corners.
If p is an inner corner of some $R'_i$ but p is not an inner corner of $R_i$, then p is an inward corner.
As described above, if all inward corners of a t-approximate boundary V are known, then we can
determine whether a query point q dominates some point of V.

A t-approximate boundary for a set $S$ consists of $O(|S|/t)$ ridges: since some point of each $R'_i$
except the lowest one is dominated by $3t$ points of $S$ and each point of $R_{i-1}$ is dominated by $2t$
points, there are at least $t$ points with z-coordinates between $R_{i-1}$ and $R'_i$. The number of inner
corners on each ridge is also $O(|S|/t)$: suppose that during step (2) point p moves from position q to
position r, i.e., q is reached during the previous step (1) and r is the inner corner. Then there are $t$
points whose x-coordinates are between r.x and q.x. If $t \ge B$ and $|S| \le B^{4/3}$, then the number of
ridges in a t-approximate boundary and the number of inner corners in each ridge is $O(B^{1/3})$. We
can use this to construct the data structure for a t-approximate boundary with $O(B)$ operations.
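As a quick check of the last bound, the arithmetic can be written out as follows. Constants hidden in the O-notation are ignored, and the same quotient bounds the number of inner corners on a ridge.

```latex
\[
  \#\text{ridges} = O\!\left(\frac{|S|}{t}\right)
  \quad\text{and}\quad
  \frac{|S|}{t} \;\le\; \frac{B^{4/3}}{B} \;=\; B^{1/3}
  \quad\text{whenever } |S| \le B^{4/3} \text{ and } t \ge B .
\]
```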

Lemma 5 If $|S| = O(B^{4/3})$ and $t \ge B$, then a t-approximate boundary for $S$ can be constructed
with $O(B^{2/3})$ I/O operations.

Proof: All points of $S$ are sorted in decreasing order by their z-coordinates and stored in a list
L. Suppose that the ridge $R_i$ is already constructed. We store the inner corners of $R_i$ in the data
structure $\mathcal{R}$. The number of elements in $\mathcal{R}$ is $O(B^{1/3})$; hence, $\mathcal{R}$ can be stored in the main memory.
For every element e of $\mathcal{R}$ we store the number of already processed points in L whose projections
on the (x,y)-plane dominate the projection of e on the (x,y)-plane. Processed points have higher
z-coordinates than the current position of $R_i$. We read the next B points from L and look for the
highest point p, such that some $e \in \mathcal{R}$ is dominated by $3t$ points $q \in L$ with q.z > p.z. If there is
no such p, we modify the data structure $\mathcal{R}$, decrease the z-coordinate of $R_i$, and read the next B
points from L. This step is repeated until we find a point p dominated by $3t$ points. When p is
found, the z-coordinate of $R_i$ is set to p.z. Then we set $R'_{i+1} = R_i$ and proceed with the construction
of $R_{i+1}$. Every time we read a block of B points, we either process B points in L or construct
a new ridge. Since the number of ridges is $O(|S|/t) = O(B^{1/3})$ and the number of points in L is
$O(B^{4/3})$, the total number of I/Os needed to process L is $O(B^{1/3})$.

When the z-coordinate of a ridge $R_i$ is known, $R_i$ can be constructed in $O(B^{1/3})$ I/Os. We
divide the already processed points of L into groups of B points sorted in decreasing order by their
x-coordinates: all points in a group $G_j$ have larger x-coordinates than points in $G_{j+1}$. We can
obtain all groups $G_j$ in $O(B^{1/3})$ I/O operations. We also divide the already processed points of L
into groups $Y_i$ of B points sorted by their y-coordinates: all points in a group $Y_i$ have smaller
y-coordinates than points in $Y_{i+1}$. Let p be the point on $R'_i$ that lies on the (x,z)-plane (i.e., p.y = 0).
We move p in the +x direction until p is dominated by $2t$ points and identify the starting point of $R_i$;
this can be done in $O(B^{1/3})$ I/Os. Suppose that the x-coordinates of all points in $G_1, G_2, \ldots, G_{j-1}$ are
greater than p.x. We initialize the variable h to j and the variable v to 1; we read $G_h$, $Y_v$, and the
inner corners of $R'_i$ into the main memory. Observe that since a ridge has $O(B^{1/3})$ inner corners,
all inner corners of $R_i$ and $R'_i$ can be stored in the main memory. We perform the steps (1)-(3) of
the loop as long as the x-coordinate of p is greater than or equal to the minimum x-coordinate of
a point in $G_h$ and the y-coordinate of p is smaller than or equal to the maximum y-coordinate of
a point in $Y_v$. As long as those conditions are satisfied, we can determine the number of points that
dominate p using $G_h$ and $Y_v$: when a point p is moved in the +y direction, the number of points that
dominate p can change only because of points in $Y_v$; when a point p is moved in the -x direction,
the number of points that dominate p can change only because of points in $G_h$. If p.y is greater
than the maximal coordinate in $Y_v$, we read the next group $Y_{v+1}$ into main memory and increment v by
1. If p.x is smaller than the minimal coordinate of $G_h$, then we read $G_{h+1}$ and increment h by 1.
Since there are $O(B^{1/3})$ groups $G_h$ and $Y_v$, the total number of I/O operations needed to construct
a ridge is $O(B^{1/3})$. Since there are $O(B^{1/3})$ ridges, we need $O(B^{2/3})$ operations to construct all
ridges. Hence, the total construction cost is $O(B^{2/3})$. □
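The first phase of the proof (scanning L block by block until some inner corner is dominated by $3t$ processed points) can be illustrated by the following simplified in-memory sketch. It assumes that "dominates" means coordinate-wise greater than or equal, that the counters start at zero, and that a list slice stands in for one block read; the name `next_ridge_height` is hypothetical.

```python
def next_ridge_height(points, corners, t, B):
    """points: list of (x, y, z) sorted by decreasing z (the list L).
    corners: list of (x, y) projections of the inner corners kept in memory.
    Returns (z, reads): the z-coordinate of the first point p such that some
    corner is dominated by at least 3t points scanned before p, and the number
    of simulated block reads. Returns (None, reads) if no such point exists."""
    counts = [0] * len(corners)              # one counter per inner corner
    reads = 0
    for offset in range(0, len(points), B):
        block = points[offset:offset + B]    # one simulated block read
        reads += 1
        for (px, py, pz) in block:
            if max(counts, default=0) >= 3 * t:
                return pz, reads             # the ridge is lowered to p.z
            for k, (cx, cy) in enumerate(corners):
                if px >= cx and py >= cy:    # projection of p dominates corner k
                    counts[k] += 1
    return None, reads
```

For example, with a single corner at the origin, $t = 1$ and $B = 2$, the scan stops at the fourth point after two block reads.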

Since there are Od^l) inward corners, we cannot directly store Q(t) points that dominate each 
inward corner. A data structure that uses linear space and reports all points that dominate an
arbitrary inward corner is described in [14]. We can transform the data structure of [14] into an
external memory data structure $\mathcal{E}$; in the case when $|S| = O(B^{4/3})$, the data structure $\mathcal{E}$ uses
$O(B^{1/3})$ blocks of space and reports all points that dominate an arbitrary inward corner in $O(t/B)$
I/Os. The following lemma shows how we can support batches of range reporting queries on a
small set.

Lemma 6 For any $c \ge 3$, there exists a data structure for a set of $F = O(B^{1+1/c})$ points that
supports $f = F/B$ range reporting queries in $O(B^{1/c} + X/B)$ I/O operations, where $X = \sum_{i=1}^{f} X_i + f$
and $X_i$ is the number of points in the answer to the $i$-th query. The data structure uses $B^{1/c}$ blocks
of space and can be constructed in $O(B^{1/c})$ I/O operations.

Proof: Suppose that a set A consists of F points. We divide A into $F/B = O(B^{1/c})$ subsets $A_i$,
such that each $A_i$ contains B points. We can read all queries $Q_j$, $j = 1, \ldots, F/B$, and all points of
$A_i$ into the main memory with $O(1)$ read operations. Then, we can find all pairs $(p,j)$, such that
$p \in A_i \cap Q_j$. For all sets $A_i$, we need $O(B^{1/c} + X/B)$ read and write operations to produce a list L
that contains all such pairs. The list L contains all points that belong to $Q_1, \ldots, Q_{F/B}$. It remains
to determine which points belong to which query ranges. Using, e.g., the sorting algorithm of [8],
we sort L by its second component in $O(\frac{X}{B} \log_B X) = O(\frac{X}{B})$ operations. Now we can scan the list
and output all points p that belong to a pair $(p,i)$ as the answer to a query $Q_i$ for $i = 1, \ldots, f$. □
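The batching idea of this proof can be sketched in a few lines. The sketch below is an in-memory illustration only: a list slice stands in for reading one subset $A_i$, Python's built-in sort replaces the external sorting algorithm of [8], and the function name and the box representation are assumptions made for the example.

```python
from itertools import groupby


def batched_range_report(points, queries, B):
    """points: list of (x, y, z) tuples.
    queries: list of boxes ((x1, x2), (y1, y2), (z1, z2)) with inclusive bounds.
    Returns one list of reported points per query, in query order."""
    pairs = []                                   # the list L of pairs (p, j)
    for offset in range(0, len(points), B):
        chunk = points[offset:offset + B]        # subset A_i: one block of points
        for p in chunk:
            for j, box in enumerate(queries):
                if all(lo <= c <= hi for c, (lo, hi) in zip(p, box)):
                    pairs.append((p, j))
    pairs.sort(key=lambda pair: pair[1])         # sort L by its second component
    answers = [[] for _ in queries]
    for j, group in groupby(pairs, key=lambda pair: pair[1]):
        answers[j] = [p for p, _ in group]       # one scan groups the answers
    return answers
```

Each chunk is compared against all queries while it is in memory, so the points are read once; the final sort and scan only touch the produced pairs, which is what keeps the cost proportional to $B^{1/c} + X/B$ in the lemma.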

The data structure $\mathcal{E}$ can be constructed by constructing $B^{1/3}$ instances of the data structure of Lemma 6
and answering batches of range reporting queries [14]. We use three data structures $\mathcal{Y}_{min}$, $\mathcal{Y}_{max}$,
and $\mathcal{X}$ that support three-dimensional queries. All data structures answer $B^{1/3}$ batches of queries
(one batch for each ridge), and each batch consists of $B^{1/3}$ queries. Each batch can be processed in
$O(B^{1/3})$ I/O operations by Lemma 6. Hence the data structure $\mathcal{E}$ can be constructed in $O(B)$ I/Os.
We consider the case when the set of points $S$ contains $O(B^{4/3})$ points, and $t \ge B$. The algorithm
presented below and its description are very similar to the algorithm that will be included in the
full version of [14]. The description in this section is provided only for the sake of completeness.

The inward corners $c_j$ of V, such that $\pi(c_j)$ dominates $\pi(c_i)$ for some inward corner $c_i$, but does
not dominate the projection on the (x,y)-plane of one of the previously visited inward corners, are
called the children of $c_i$. A corner $c_i$ is a parent of $c_j$ if $c_j$ is a child of $c_i$. Descendants of $c_i$ are
children of $c_i$ and descendants of children of $c_i$. With each inward corner $c_i$ we associate a list of
points $Dom(c_i)$. A d-neighbor of an inward corner v is one of the d preceding or d following corners
on the same ridge. The dominance list $Dom(c_i)$ contains all points p that satisfy the following
conditions:

• p dominates $c_i$

• p is not contained in the dominance lists of descendants of $c_i$

• p is not contained in the dominance lists of descendants of one of the 3-neighbors of $c_i$

• p is not contained in the dominance lists of those 3-neighbors of $c_i$ that belong to ridges with
lower z-coordinate

When dominance lists for all inward corners are constructed, we can output all points that
dominate any inward corner of a t-approximate boundary in $O(t/B)$ operations. Details can be
found in [13]. We will show below how dominance lists can be constructed in external memory with
help of El 

We construct dominance lists for inward corners of ridges $R'_g, \ldots, R'_1$, so that ridges are processed
in the ascending order of their z-coordinates. For each ridge $R'_i$, inward corners are processed in the
ascending order of their x-coordinates. We will use three auxiliary data structures. The static data
structure $\mathcal{X}$ stores all points of $S$ and supports three-dimensional range reporting queries. Data
structures $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$ support updates and three-dimensional dominance reporting queries. At
the beginning of the algorithm, the data structures $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$ are empty; we update these
data structures every time all inward corners of some ridge $R'_i$ are processed.

Points in the dominance list of an inward corner $c_j \in R'_i$ can be divided into two groups: 1.
Points that are stored in a dominance list of some inward corner(s) $c_s$, such that $c_s$ is neither a
3-neighbor of $c_j$, nor a descendant of $c_j$ or one of its 3-neighbors. 2. "New" points, i.e., points that
dominate $c_j$ but do not dominate any previously processed inward corner $c_s$. An example is shown
in Fig. 2.

Points in the first group can be found with the help of the data structures $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$: when the
dominance list of an inward corner $c_j \in R'_i$ is constructed, $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$ contain information
about all points stored in the dominance lists of inward corners on ridges $R'_g, R'_{g-1}, \ldots, R'_{i+1}$. For
every point $p = (x_p, y_p, z_p)$, the data structure $\mathcal{Y}_{min}$ contains an element $(x_p, ind_i, z_p)$. Here $ind_i$
denotes the index of the inward corner $c_i$, such that p belongs to $Dom(c_i)$. If p is stored in the
dominance lists of more than one inward corner, then we choose the corner $c_i$ with the highest index.
The data structure $\mathcal{Y}_{min}$ supports queries $(a^+, b^-, c^+)$: report all elements $(x_p, ind_p, z_p)$ of $\mathcal{Y}_{min}$ such
that $x_p > a^+$, $ind_p < b^-$, and $z_p > c^+$. Clearly, such queries are equivalent to three-dimensional
dominance reporting queries. The data structure $\mathcal{Y}_{max}$ contains a point $(x_p, ind_l, z_p)$ for every
point $p = (x_p, y_p, z_p)$. Again, $ind_l$ is the index of an inward corner $c_l$ with $p \in Dom(c_l)$, but if p
is stored in more than one dominance list, we choose the inward corner $c_l$ with the lowest index.
$\mathcal{Y}_{max}$ supports queries $(a^+, b^+, c^+)$: report all elements $(x_p, ind_p, z_p)$ of $\mathcal{Y}_{max}$ such that $x_p > a^+$,
$ind_p > b^+$, and $z_p > c^+$.
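The contents of the two index structures and their query semantics can be made concrete with a small sketch. It stores the elements in plain Python lists and answers queries by a linear scan, so it illustrates only what is stored and reported, not the I/O bounds. The function names are hypothetical, and the strict inequalities follow the reading of the query definition given above.

```python
def build_index_structures(dom):
    """dom maps a corner index to the list of points (x, y, z) in its
    dominance list. Returns (y_min, y_max) as lists of triples (x, ind, z):
    y_min keeps the highest corner index of each point, y_max the lowest."""
    highest, lowest = {}, {}
    for ind, pts in dom.items():
        for p in pts:
            highest[p] = max(ind, highest.get(p, ind))
            lowest[p] = min(ind, lowest.get(p, ind))
    y_min = [(p[0], ind, p[2]) for p, ind in highest.items()]
    y_max = [(p[0], ind, p[2]) for p, ind in lowest.items()]
    return y_min, y_max


def query_min(y_min, a, b, c):
    """Report elements with x > a, ind < b and z > c."""
    return [(x, ind, z) for (x, ind, z) in y_min if x > a and ind < b and z > c]


def query_max(y_max, a, b, c):
    """Report elements with x > a, ind > b and z > c."""
    return [(x, ind, z) for (x, ind, z) in y_max if x > a and ind > b and z > c]
```

Negating the index coordinate turns the condition `ind < b` into a lower bound, which is one way to see why such queries are equivalent to three-dimensional dominance reporting.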

If a point p belongs to the first group, then it is stored either in a dominance list of a (descendant
of a) d-predecessor of $c_j$ or in a dominance list of a (descendant of a) d-successor of $c_j$ for $d \ge 4$. Let
$min_j$ be the minimal index of a descendant of a 3-predecessor of $c_j$. Let $max_j$ be the maximal
index of a descendant of a 3-successor of $c_j$. Let $c_j = (x_j, y_j, z_j)$. For each point p that dominates
$c_j$ and is stored in a dominance list of a d-predecessor of $c_j$ for $d > 3$, the data structure $\mathcal{Y}_{min}$ contains
a point $(x_p, ind_i, z_p)$, such that $ind_i < min_j$, $x_p > x_j$, and $z_p > z_j$. If for an element $(x_p, ind_i, z_p)$ of
$\mathcal{Y}_{min}$, $ind_i < min_j$ and $x_p > x_j$, then $y_p > y_j$ for the corresponding point p: p dominates an inward
corner that dominates some d-predecessor $c_p$ of $c_j$, and the y-coordinate of $c_p$ is greater than the
y-coordinate of $c_j$. Hence, every element $(x_p, ind_i, z_p)$ of $\mathcal{Y}_{min}$, such that $ind_i < min_j$, $x_p > x_j$, and
$z_p > z_j$ corresponds to a point that dominates $c_j$. All such elements can be found by a query to
$\mathcal{Y}_{min}$.
Figure 2: Construction of the dominance list for inward corner A. Points a, b, c, d, e, f are "new"
points. Point g must also be included in the dominance list.



For each point p that dominates $c_j$ and is stored in a dominance list of a d-successor of $c_j$
for $d \ge 4$, the data structure $\mathcal{Y}_{max}$ contains a point $(y_p, ind_l, z_p)$, such that $ind_l > max_j$, $y_p > y_j$,
and $z_p > z_j$. Again, for every $(y_p, ind_l, z_p)$ such that $ind_l > max_j$, the x-coordinate $x_p$ of the
corresponding point p is greater than or equal to $x_j$. Hence, every element $(y_p, ind_l, z_p)$ of $\mathcal{Y}_{max}$,
such that $ind_l > max_j$, $y_p > y_j$, and $z_p > z_j$ corresponds to a point that dominates $c_j$. All such
elements can be found by a query to $\mathcal{Y}_{max}$.

It remains to add the "new" points to the dominance list of $c_j$. Suppose that $c_j$ dominates
inward corners $c'_1, c'_2, \ldots, c'_q$ on the previous ridge $R'_{i+1}$ and $c'_r = (x'_r, y'_r, z'_r)$ for $r = 1, \ldots, q$. We
denote by $c_{j-1} = (x_{j-1}, y_{j-1}, z_j)$ and $c_{j+1} = (x_{j+1}, y_{j+1}, z_j)$ the 1-predecessor and the 1-successor
of $c_j$. Let $x'_0 = x_j$, $y'_0 = y_{j-1}$, $x'_{q+1} = x_{j+1}$, $y'_{q+1} = y_j$. All "new" points can be found with
$q + 1$ queries $Q_1, Q_2, \ldots, Q_{q+1}$, where $Q_r = [x'_{r-1}, x'_r] \times [y'_r, y'_{r-1}] \times [z_j, +\infty)$ (see Fig. 2). The total
number of queries that must be answered to find all "new" points for all inward corners of $R'_i$ is
$O(n_i + n_{i-1})$, where $n_i$ is the number of inward corners on a ridge $R'_i$. Since $n_i = O(B^{1/3})$, we can
answer all queries for a ridge $R'_i$ in $O(B^{2/3})$ I/Os by Lemma 6.
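The $q + 1$ queries for the "new" points form a staircase between the 1-predecessor and the 1-successor of $c_j$. The sketch below generates them under the assumption that the queries read $Q_r = [x'_{r-1}, x'_r] \times [y'_r, y'_{r-1}] \times [z_j, +\infty)$ with $x'_0 = x_j$, $y'_0 = y_{j-1}$, $x'_{q+1} = x_{j+1}$ and $y'_{q+1} = y_j$. The function name and the argument layout are hypothetical.

```python
def new_point_queries(cj, y_prev, x_next, children):
    """cj = (xj, yj, zj) is the current inward corner.
    y_prev is the y-coordinate of its 1-predecessor, x_next the x-coordinate
    of its 1-successor. children is the list of (x, y) projections of the
    corners c'_1..c'_q on the previous ridge, sorted by increasing x.
    Returns q + 1 boxes ((x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi))."""
    xj, yj, zj = cj
    xs = [xj] + [c[0] for c in children] + [x_next]   # x'_0 .. x'_{q+1}
    ys = [y_prev] + [c[1] for c in children] + [yj]   # y'_0 .. y'_{q+1}
    return [((xs[r - 1], xs[r]), (ys[r], ys[r - 1]), (zj, float("inf")))
            for r in range(1, len(xs))]
```

With two child corners the function returns three boxes whose x-ranges are consecutive and whose y-ranges descend, which is the staircase shape suggested by Fig. 2.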

When dominance lists for all inward corners of a ridge $R'_i$ are constructed, we update "points" in
the data structures $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$. The list L contains points p reported by queries to $\mathcal{Y}_{max}$ and $\mathcal{Y}_{min}$.
That is, the list L contains all points p, such that p is stored in the dominance list of some inward
corner $c_f$ on a ridge $R'_j$, $j > i$, and in the dominance list of some inward corner $c_h$ on the ridge $R'_i$. Let
min(p) and max(p) denote the minimal and the maximal i, such that p is stored in the dominance
list of the corner with index i. Using $O(N/B)$ additional space, we can determine whether a point
p belongs to the list L and maintain for every point p in L the values min(p) and max(p).

For each point $p = (x_p, y_p, z_p)$ stored in L we proceed as follows: if $min(p) < ind_1$ for the
corresponding point $(x_p, ind_1, z_p)$ stored in $\mathcal{Y}_{max}$, we delete $(x_p, ind_1, z_p)$ from $\mathcal{Y}_{max}$ and insert
$(x_p, min(p), z_p)$ into $\mathcal{Y}_{max}$; if $max(p) > ind_2$ for the corresponding point $(x_p, ind_2, z_p)$ stored in
$\mathcal{Y}_{min}$, we delete $(x_p, ind_2, z_p)$ from $\mathcal{Y}_{min}$ and insert $(x_p, max(p), z_p)$ into $\mathcal{Y}_{min}$. For each "new" point
p, we add $(x_p, min(p), z_p)$ to $\mathcal{Y}_{max}$ and $(x_p, max(p), z_p)$ to $\mathcal{Y}_{min}$. Since both $\mathcal{Y}_{min}$ and $\mathcal{Y}_{max}$ consist
of just a list of points (see Lemma 6) and the total number of points in both data structures is
$O(B^{4/3})$, we can construct new versions of $\mathcal{Y}_{min}$ and $\mathcal{Y}_{max}$ in $O(B^{1/3})$ I/Os.
Since the total number of ridges is $O(B^{1/3})$, all dominance lists can be constructed in $O(B)$
I/Os.
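The total can be summarized as a product over ridges, combining the per-ridge cost of answering the batched queries with the per-ridge cost of rebuilding the two index structures, both as stated above:

```latex
\[
  \underbrace{O(B^{1/3})}_{\text{ridges}} \cdot
  \Bigl( \underbrace{O(B^{2/3})}_{\text{queries per ridge}}
       + \underbrace{O(B^{1/3})}_{\text{rebuilding } \mathcal{Y}_{min},\ \mathcal{Y}_{max}} \Bigr)
  \;=\; O(B) .
\]
```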

6 Conclusion 

In this paper we presented the first dynamic data structure that supports three-dimensional orthogonal
range reporting queries in $O(\log_B^2 N + K/B)$ I/O operations. This query cost "matches" the
query bound of the fastest internal memory data structure. The space usage of our data structure
is quite comparable with the most space-efficient static data structure [2]. It is an interesting
open question whether the $O(\log_2^3 N)$ update cost can be significantly improved.

Using our approach, we can also obtain data structures that support special cases of range
reporting queries; these data structures answer queries in $O(\log_B^2 N)$ I/Os, but use less space and
support faster update operations than the data structure of Theorem 1. In particular, we can
obtain:

(i) The data structure for (1, 1, l)-sided queries (three-dimensional dominance queries) that uses 
0((N/B) log 2 N) blocks of space and supports updates in 0(log^ N) I/Os. 

(ii) The data structure for (1, l,2)-sided queries that uses 0((N/B) log 2 iVlog 2 B) blocks of space 
and supports updates in 0(log 2 iVlog^ N) I/Os. 

(iii) The data structure for (2,1,2)-sided queries that uses $O((N/B) \log_2 N \log_2^2 B)$ blocks of space
and supports updates in $O(\log_2^2 N)$ I/Os.

The data structure (iii) is the result of Lemma 4. We obtain the results (i) and (ii) by replacing
the data structures $D_v$, $D'_v$, and $E_v$ in the proof of Lemma 3 with data structures that support
(1,1,1)-sided queries (resp. (1,1,2)-sided queries) on a set with $O(B^{4/3})$ points. Details will be
given in the full version of this paper.